Features for segmenting and classifying long-duration recordings of "personal" audio
Abstract
A digital recorder that weighs only ounces and can record for more than ten hours can be bought for a few hundred dollars. Such devices make it possible to record “personal audio” continuously, storing essentially everything the owner hears. Without automatic indexing, however, such recordings are almost useless. In this paper, we describe experiments with recordings of this kind, focusing on the problem of segmenting them into ‘episodes’ that correspond to the different acoustic environments the device experiences. We present several novel features for characterizing 1-minute-long frames of audio and investigate how well they reproduce hand-labeled ground-truth segment boundaries.
Related resources
Forced alignment for speech synthesis databases using duration and prosodic phrase breaks
Alignment of text to recorded audio is limited by the fact that standard techniques do not handle very long utterances well. This work presents a model for segmenting long recordings into smaller utterances. Our approach differs from typical forced alignment techniques in that prosodic phrase break locations are first estimated, and then words are placed around breaks based on length and break ...
Automatically segmenting and clustering minimal-impact personal audio archives
To capture essentially everything that you hear takes little more than a $100 MP3 player with a built-in microphone; a year’s worth of recordings is maybe 60 GB, or a small stack of writable DVDs. We have been collecting this kind of ‘personal audio’ on and off for a couple of years, and experimenting with methods to index and access the resulting data. Audio archives have several distinctive f...
Revealing the ecological content of long-duration audio-recordings of the environment through clustering and visualisation
Audio recordings of the environment are an increasingly important technique to monitor biodiversity and ecosystem function. While the acquisition of long-duration recordings is becoming easier and cheaper, the analysis and interpretation of that audio remains a significant research area. The issue addressed in this paper is the automated reduction of environmental audio data to facilitate ecolo...
Code-Copying in the Balochi Language of Sistan
This empirical study deals with language contact phenomena in Sistan. Code-copying is viewed as a strategy of linguistic behavior when a dominated language acquires new elements in lexicon, phonology, morphology, syntax, pragmatic organization, etc., which can be interpreted as copies of a dominating language. In this framework Persian is regarded as the model code which provides elements for b...
Voice pathology detection and classification using MPEG-7 audio low-level features
In this paper, a new pathological voice detection and pathology classification method based on MPEG-7 audio low-level features is proposed. MPEG-7 features were originally used for multimedia indexing, which includes both video and audio. Indexing is related to event detection, and as pathological voice is a separate event from normal voice, we show that MPEG-7 audio low-level features can do ver...
Publication date: 2004